Exploring the genetic diversity and population structure of aerial yams (Dioscorea bulbifera L.) DArT-seq and agronomic traits

Dioscorea bulbifera is an edible yam specie with aerial bulbils. Assessing the genetic diversity of D. bulbifera accession for cultivation and breeding purposes is essential for it genetic improvement, especially where the crop faces minimal attention. The aims of this study was to assess the genetic diversity of Dioscorea bulbifera accessions collected from Nigeria and accessions maintained at the genebank of International Institute of Tropical Agriculture (IITA) Ibadan. Accessions were profiled using quatitative and qualitative phenotypic traits and Diversity Array Technology SNP-markers. Multivariate analysis based phenotypic traits revealed high variability among the evaluated accessions and all phenotypic traits assessed were useful in discriminating the aerial yam accessions. Clustering analysis based phenotypic traits revealed the presence of two well defined clusters. Using DArT-Seq marker, the 94 accessions were classified into three genetic group through the admixture and the phylogeny analysis. The comparision of phenotypic and genotypic clustering revealed inconsistency membership across the two clustering methods. The study established a baseline for the selection of parental lines from the genetic groups for genetic improvement of the D. bulbifera.


Introduction
Yams (Dioscorea L.) belong to the family Dioscoreaceae.Dioscorea L. is the largest genus of the family Dioscoreaceae [1,2].Yams (Dioscorea spp.) are popular staples in West Africa [3], serving as important sources of dietary calories.They contributed, on average, more than 200 kilocalories per person per day to over 300 million people between 2006 and 2010 [4].The crop is widely adapted to several agro-ecologies in Nigeria, along with the ability of farmers to harvest their tubers at their convenience, makes it an essential food security crop [5].For millions of people, yam tubers form an integral part of social-cultural activities [6] and medicinal values [7].Nigeria ranks as the leading producer of yams (Dioscorea spp.) in the world, accounting for 65% (about 50.1 million tons) of annual global production [8].Though yams are grown for their carbohydrate content, their storage organs are important sources of vitamins for millions of people in West Africa, Southeast Asia, and the Caribbean [9][10][11].Many species also contain substantial amounts of vitamins such as carotene, thiamine, riboflavin, niacin and some minerals like calcium, phosphorus and iron [7,12,13].There are over 600 species of yams world wide, of which about 12 species of economic significance as food plants [14,15].
Dioscorea bulbifera (commonly known as the air potato, air yam) is native to Africa and Asia with Asian cultivars less angular, more spherical, and less toxic than African cultivars [5].D. bulbifera has been therapeutic potentials in the management of ailments [7,16].The cultivation of D. bulbifera in Nigeria has remained traditional and is typically carried out by smallholder farmers.The lack of adequate information on genetic diversity limits its genetic improvement.To develop elite genotypes that combine high and stable yield of good tuber quality combine with diseases and pests resistant, a wide range of genetic diversity is required.Studies using agro-morphological do not demonstrate the true genetic relatedness of the accessions and are strongly influenced by environmental factors [17,18].In Nigeria, there is limited information on genetic diversity status of D. bulbifera using molecular markers.Molecular markers have been utilized as powerful tools for the estimation of genetic diversity of many species with great success and accuracy because they are abundant and unaffected by environmental parameters [19].So far, markers used for assessing diversity in yams include restriction fragment length polymorphism (RFLP) of cpDNA [20], random amplified polymorphic DNA (RAPD) [21,22] amplified fragment length polymorphism (AFLP) markers [23,24], and SSR markers [25][26][27][28].Next-generation sequencing (DArTseq-based) has been successfully applied in germplasm characterization in white guinea yam [29][30][31][32] water yam [33,34], and in trifoliate yam [35], thus demonstrating its suitability for the high-throughput genotyping in yams.However, no genetic characterization study has been conducted on the D. bulbifera.This study aimed to better understanding of the genetic relationships and population structure of D. bulbifera in Nigeria.This will set the foundation for the crop improvement and development of breeding strategies in Nigeria.

Experimental site
The experiment was conducted in the field during the 2020 and 2021 planting seasons at the eastern farm of the National Root Crops Research Institute, Umudike, located at 5˚20' N latitude and 07˚33'E longitude.The yam varieties were planted in the month of May while the harvesting was done in January.The annual rainfall ranges from 1800 to 2200 mm with an average annual air temperature and relative humidity of 20.50˚C and 76.80%, respectively.Umudike soil is characterized by a well-drainedred to yellowish sandy loam to sandy clays occuring at the summits and upper slopes of the ridges.

Collection of planting materials
The planting materials used for this study consisted of 94 D. bulbifera accessions representing originated from different source.A total of 64 accessions were sourced from the genebank of International Institute of Tropical Agriculture (IITA) Ibadan, Nigeria.Thirty collections were also sourced from the National Root Crops Research Institute (NRCRI), Umudike, Nigeria (Table 1).The accessions were collected for research pupose only as agreed by the (NRCRI).The accessions were profiled for phenotypic traits and no extraits was tested on human.The study was conducted in the view of profiling D. bulbifera for breeding purpose.All activities was conducted in respect of the study design.All authors reviewed the manuscript and approved the submitted version.

Field establishment
The experimental design used was the randomized complete block design (RCBD).The planting was arranged in ridges with planting at the beginning of the rainy season.A single row plot was used for each accession, each row measuring 7m long with 1m spacing between rows and 1m spaces between plants within a row.Each accession was planted in two replication for one cropping season.This spacing regimen was done to avoid competition among neighbouring plants and to ensure sound establishment of each accession.Individual plants were supported by bamboo stakes.Standard agronomic practices such as manual weeding, with no irrigation, no chemical was adopted.Phenotypic data was collected on the five middle plants from each row.A total of 10 quantitatives and 5 qualitatives traits was used to profile the 94 accessions.

Phenotyping
The trais described in Table 2 were used to assess the agronomic performance of the 94 yam accessions.Data was recorded using the yam standard operational protocol developed by [36] with slight modification.

Genotyping
Yam leaf sampling and DNA extraction.Young, healthy and fully expanded fresh leaf samples were collected at two months after the all accessions were fully established.About three to five tender leaves, weighing more than 20mg were collected in well labelled bags containing silica-gel granules with a colour indicator.The leaf samples were stored in the silica-gel for 72 hours to remove the moisture.Subsequently, genomic DNA (gDNA) extraction was carried out at the Bioscience centre, International Institute of Tropical Agriculture (IITA), Ibadan, Nigeria, using the CTAB procedure with slight modification [37].The DNA quality and concentration was ascertained by running the gDNA in a 1% agarose gel and on a NanoDrop 2000 spectrophotometer, following the methods described in [38].
SNP genotyping assay and SNP filtering.High quality DNA was sent to Diversity Array Technology (DArT) Australia for the sequencing.For the sequencing-based DArT genotyping, SNPs were called using complexity reduction methods optimized for yam at the DArT's proprietary software, DArTSoft, as described by [39].We aligned the raw reads to the yam reference genome [40].A total of 11,721 DArTseq SNP-derived markers were obtained as raw SNP markers and subjected to quality control (QC) to eliminate the unwanted SNP markers.For the QC implementation, software PLINK [41] version 1.9 and VCFtools [42] were used.Markers and genotypes with high missing value (>20%) were eliminated as well as SNP marker with low minor allele frequency (<5%) and low call rate.After quality filtering, 10,087 DArT-SNP markers distributed across 20 yams chromosomes were retained and used for downstream analysis.Genotypic data can be downloaded from the downloaded from the open access YamBase (www.yambase.org).
Phenotypic data analysis.Quantitative data collected were subjected to descriptive statistical analysis (minimum, maximum, average and standard error) using basic function in R. Principal component analysis (PCA) was performed using FactorMiner package [43].The PCA data was used to generate eigenvalues, cumulative variability, and load coefficient values.The principal components (PC) with eigenvalues > 1.0 were selected, and those traits that had load coefficients � 0.5 were considered relevant scores for the PC and considered as valuable traits for distinguishing between the genotypes [44].
Assessment of genetic diversity.Using VCFtools [42] and PLINK 1.9 [41] minor allele frequency (MAF), polymorphic information content (PIC), expected heterozygosity (He), and observed heterozygosity (Ho) were estimated.For Genetic diversity analysis we first estimated the optimal number of clusters by using k-means analysis [45,46] of PCA (principal component analysis)-transformed genome-wide SNP data by varying the possible number of clusters from 1 to 10. Population structure analysis was conducted through ADMIXTURE analysis using adegenet R package.Ancestry probability was used to determine the most appropriate K, and accessions with membership proportions (Q-value) � 60% were assigned to groups, while those with membership probabilitie s less than 60% were considered as admixt [47].To complement the the PCA and population structure, hierachical cluster (HC) analysis was conducted using Jaccard dissimilarity matrix.Clusters generated for both genotypic and phenotypic were compared using dendextend package.

Variability based phenotypic data
The result of the phenotypic statistics, including mean, minimum, range, standard deviation and maximum are shown in Table 3. High phenotypic variation was observed for quantitative traits.The accessions exhibited significant variability for most of the traits assessed.Trait such bulbil width displayed wide range from 3 to 8 with 5 as average while the number of branches on main stem ranged from 3 to 17 with 9.50 as average value.Similarly.For the yam leaf length, it ranged from 10 to 16 cm among the accessions with a mean value of 13.59cm.
Through the PCA, the first three components explained 73.31% of the total variation (Table 4).The first principal component (PC1) had an Eigen value of 9.17, contributing to 57.11% of the total variation.On the first component, all evaluated variables displayed high variation except yam leaf margin color and the yam anthracnose disease.On the second principal component which explained 8.80 of the total vaiation the leaf margin color and the anthracnose disease were identified to be highly associated.
The influence of the traits on the principal components and the levels of correlation among them are presented in Fig 1 .Traits like Leaf shape (LS), Bulbils shape (BS), Plant height (PH), Number of stem per plant (SPP), Number of branches on main stem (NBMS), Llengh.Lwidth, non marketable (Non.MKT), leaf length (LLENGH), leaf width (LWIDTH) and Total number of bubils (TNB) were positively associated while traits like LH, LAS, BDia, Blenght, Bwidth) were all negatively associated.
Correlation analysis revealed positive correlation among several of the evaluated traits, the vine length displayed positive corelation with the stem per plant and with the number of the branches per stem.Also, and through the correlation analysis positive colleration was reported between the leaf length, the number of branches per plant, the stem number per plant and the vine length while negative correlation was observed between the stem diameter and the number of stem per plant (Fig 2).

Phenotypic classification
Using 16 traits, the evaluated yam accessions were grouped into 2 major groups (Fig 3).The first cluster (green) was made of only accessions collected from IITA genebank.Accessions of this cluster are characterized with low yield.The second cluster (red) was made with accessions collected from different farmers field across the South East of Nigeria.Accession of this cluster were identified to have high number of bulbils with large tubers.

Summary statictic based SNP markers
A total of 11,721 SNP markers was initially generated, from which 10,087 was retained after the removal of low-quality SNP markers.The SNP markers were unequally distributed across the twenty (20) chromosomes of D. bulbifera (Table 5).The highest number of SNPs (516) was mapped on chromosome 10 and the lowest number of SNPs (489) was mapped on chromosome 15 (Table 5).Transition SNPs (62.39%, 6293 SNPs) were more frequent than transversions SNPs (37.61%, 3794 SNPs).The C/T transitions (31.14%) accounted for the highest frequency, while C/G transversions (6.77%) occurred at the lowest frequency (Fig 4).The average polymorphic information content (PIC) value across all the markers was 0.268, it ranged

Population structure
Through the Bayesian Information Criterion (BIC) and complementary coordination analysis, three clusters was identified as the optimum number of the possible group number (Fig 6).The 94 accessions were classified into three (3) main groups (Fig 7) through phylogenic analysis.The first cluster (red), which were all made up of IITA genebank accessions.A larger number of accessions were in the second cluster (green) containing forty five (45)

Clustering based phenotypic and genotypic data
Clustering based on phenotypic data revealed the presence of two major clusters, whereas molecular marker analysis grouped the evaluated yam accessions into three distinct clusters.When comparing the clustering methods, none of the yam accessions were positioned in the same cluster, given the 0.29 entanglement threshold (Fig 9) comparision of genotypic and phenotypic.

Discussion
In this study, we evaluated the level of the genetic diversity among 94 accessions of D. bulbifera using both phenotypic and molecular markers.A better understanding of the existing aerial yam germplasm is one of the prerequisites for breeding new genotypes with novel or improved   characteristics.The principal component derived from the study of these traits displayed 73.17% of the total phenotypic on the first three components coupled with large differences between minimum and maximum confers the strength of these traits for traits profiling.Observed variability could be as a result of sexual recombination [48], and natural mutation and long term selection occurring in the course of its ongoing domestication process in some agro-ecological zones [49][50][51][52].Despite the potential of phenotypic traits in diversity studies, their expression may be partly subjected to environmental variation, thus, providing limited genetic information [18].The DArTseq genotyping detected a total of 10,087 informative SNPs, which were unequally distributed among and within the 20 aerial yam chromosomes at various densities.The average PIC value of 0.27 obtained in the present study is higher than previous reports [53] but comparable to some other studies [54][55][56].This shows the informativeness of the SNP markers used in this study.
Through structure and phylogenetic tree analyses, the 94 aerial yam accessions used in this study were classified into three subgroups.For all the subgroups, both accessions obtained from IITA genebank and the eastern part of Nigeria were well distributed and showed high correspondence in clustering patterns between the different grouping methods.Similar results were observed in Dioscorea alata [35].The three subgroups of aerial yam accessions was observed to exhibit a low level of admixture.These observed genetic divergence with a low admixture level is due to original differences in domestication and subsequent vegetative propagation by farmers.Yam clones in farmer's field are genetically homogenous with negligible recombination rates, as farmers select their planting material from tubers and not from botanical seed.However, the molecular evidence on ennobled cultivars showed that the tubers collected by farmers from wild environments are often a mixture of wild and interspecific hybrids [40,57,58].This could partly explain the origin of the genetic admixture identified among the aerial yam accessions.Harnessing the advantages of phenotypic and molecular markers improves the grouping of entries in a germplasm collection [59], which provides a piece of valuable base information for parental selection to realize and sustain genetic gain.To our knowledge, our study represents the first high throughput genotyping of D. bulbifera and thus highlights the valuable prospect of utilizing DArTseq-SNP markers for molecular genetic studies in yams.Our findings buttress the fact that there are in and across country related variations.The studied genotypes were collected from different countries and diverse locations in Nigeria.The selected SNP markers resulted to three cluster groups.Two independent cluster groups comprised mainly individuals from IITA genebank while a third cluster comprised mostly of cultivars obtained from South Eastern states of Nigeria.The trend in variability infers evolutionary and admixture relationship as reported by Adjei [31].Cluster II had individuals from different countries suggesting some sort of clonal dispersion though trade and migration within the African sub region.We observed negligible association between the phenotypic and genotypic data which could be a result of high variation in phenotypic traits.In a similar study, Agre et al. [32,34] reported similar result on D. alata and D. rotundata in a study conducted in Nigeria and Benin.The low correlation between morphological and molecular data in the D. bulbufera accessions generally suggests that the two data types are appropriate for a combined use, which can deepen understanding and discriminate the genotypes better due to the non-overlapping information.
This implies the existence of a robust genepool from which breeding programs can thrive to make selection and breeding advance.It is noteworthy to highlight that breeding programs on yam has prioritised D. rotundata and D.alata in the past.This conscious effort sets the very foundation for D. bulbifera crop improvement strategy.

Conclusion
In this study, we investigated the genetic diversity of Dioscorea bulbifera using agronomic phenotypic and DArTseq SNP markers.This study revealed the presence of moderate genetic diversity among the evaluated genotypes.Genetic diversity through the population structure and hierarchical clustering analysis displayed same genetic pattern in the grouping.We provided as well the relevance of combining phenotypic data with molecular markers for proper genetic profiling.

Fig 5 .
Fig 5. Histogram showing the distribution of SNPs markers associated with D.bulbifera genome.A) Distribution of expected heterozygosity in the genome B) Distribution of observed heterozygosity in the genome.C) Distribution of Minor allele frequency in the genome.D) Distribution of polymorphic information content.https://doi.org/10.1371/journal.pone.0306631.g005